Fully Dynamic Data Structure for Top-k Queries on Uncertain Data

Authors

  • Manish Patil
  • Rahul Shah
  • Sharma V. Thankachan

Abstract

Top-k queries allow end users to focus on the most important (top-k) answers among those that satisfy the query. In traditional databases, a user-defined score function assigns a score to each tuple, and a top-k query returns the k tuples with the highest scores. In an uncertain database, the top-k answer depends not only on the scores but also on the membership probabilities of the tuples. Several top-k definitions covering different aspects of the score-probability interplay have been proposed recently [10], [4], [2], [8]. Most existing work in this area focuses on efficient algorithms for answering top-k queries on static uncertain data. Any change in the underlying data (insertion or deletion of a tuple, or a change in a tuple's membership probability or score) forces the query answers to be recomputed. Such recomputation is impractical given the dynamic nature of data in many applications. In this paper, we propose a fully dynamic data structure that uses the ranking function PRF^e(α) proposed by Li et al. [8] under the widely adopted model of x-relations [11]. Depending on the value of the parameter α, PRF^e can effectively approximate various other top-k definitions on uncertain data. An x-relation consists of a number of x-tuples, where an x-tuple is a set of mutually exclusive tuples (at most a constant number of them) called alternatives. Each x-tuple in a relation randomly instantiates into one tuple from its alternatives. For an uncertain relation with N tuples, our structure answers top-k queries in O(k log N) time, handles an update in O(log N) time, and takes O(N) space. Finally, we evaluate the practical efficiency of our structure on both synthetic and real data.
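The abstract does not spell out how PRF^e(α) is evaluated on an x-relation. As a rough illustration only, the Python sketch below computes a PRF^e-style value for every tuple of a small x-relation with a naive sweep in descending score order. It assumes the weight α^i for rank i (ranks counted from 1), a real α in (0, 1], distinct scores, independent x-tuples, and that an x-tuple whose alternatives' probabilities sum to less than 1 may produce no tuple at all. The names `Alt`, `prf_e_scores` and `top_k` are hypothetical. This is not the authors' fully dynamic structure: it recomputes every value from scratch in O(N·m) time for m x-tuples, which is exactly the recomputation the paper sets out to avoid.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alt:
    """One alternative (tuple) of an x-tuple; a hypothetical representation."""
    name: str
    xid: str      # id of the x-tuple this alternative belongs to
    score: float
    prob: float   # membership probability


def prf_e_scores(alts, alpha):
    """Naive PRF^e-style value for every tuple, assuming weight alpha**i at rank i >= 1.

    For a tuple t: value(t) = p(t) * alpha * product, over every other x-tuple tau,
    of (1 - q + alpha * q), where q is the probability that tau instantiates to an
    alternative scoring above t. Assumes distinct scores and independent x-tuples.
    """
    higher = defaultdict(float)  # xid -> probability mass of alternatives seen so far
    values = {}
    for a in sorted(alts, key=lambda t: t.score, reverse=True):
        value = a.prob * alpha  # t appears, at rank 1 if nothing above it appears
        for xid, q in higher.items():
            if xid != a.xid:  # t's own earlier alternatives cannot co-occur with t
                # tau pushes t down one rank (one extra factor alpha) with probability q
                value *= 1 - q + alpha * q
        values[a.name] = value
        higher[a.xid] += a.prob
    return values


def top_k(alts, alpha, k):
    """Names of the k tuples with the largest values (full recomputation, no updates)."""
    values = prf_e_scores(alts, alpha)
    return sorted(values, key=values.get, reverse=True)[:k]


relation = [
    Alt("t1", "x1", 100, 0.4),
    Alt("t2", "x2", 90, 0.9),
    Alt("t3", "x1", 80, 0.5),  # second alternative of x-tuple x1
    Alt("t4", "x3", 70, 0.8),
]
print(top_k(relation, alpha=0.9, k=2))  # → ['t2', 't4']
```

In this toy relation, t1 has the highest score but a low membership probability, so its value (0.36) is the smallest, while t2 comes first (0.7776). This mirrors the abstract's point that the answer depends on both scores and probabilities.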

Similar resources

Dynamic Structures for Top-k Queries on Uncertain Data

In an uncertain data set S = (S, p, f) where S is the ground set consisting of n elements, p : S → [0, 1] a probability function, and f : S → R a score function, each element i ∈ S with score f(i) appears independently with probability p(i). The top-k query on S asks for the set of k elements that has the maximum probability of appearing to be the k elements with the highest scores in a random ...

Ranking queries on uncertain data pdf

Top-k queries, also known as ranking queries, are often natural and useful in ... ...ing probabilistic threshold top-k queries on uncertain data. UNCERTAIN DATA MODELS W.R.T. RANKING QUERIES. ... uncertain attribute based on the associated discrete pdf and the choice is ... observed, the semantics of top-k queries on uncertain data can be ambiguous due to tradeoffs. Whether it is better to report highly ranked i...

Top-k best probability queries and semantics ranking properties on probabilistic databases

There has been much interest in answering top-k queries on probabilistic data in various applications such as market analysis, personalised services, and decision making. In probabilistic relational databases, the most common problem in answering top-k queries (ranking queries) is selecting the top-k result based on scores and top-k probabilities. In this paper, we firstly propose novel answers...

Top-k Dominating Queries: a Survey

Top-k dominating queries combine the advantages of top-k queries and skyline queries, and eliminate their disadvantages. They return k objects with the highest domination score, which is defined as the number of dominated objects. As a top-k query, the user can bound the number of returned results through the parameter k, and like a skyline query a user-selected scoring function is not required...

A Dynamic Method for Answering Ad Hoc Continuous Aggregate Queries

Data streams are infinite, fast, time-stamped data elements that are received explosively. Generally, these elements need to be processed in an online, real-time way. So, algorithms to process data streams and answer queries on these streams are mostly one-pass. The execution of such algorithms has some challenges such as memory limitation, scheduling, and accuracy of answers. They will be more ...

Journal:
  • CoRR

Volume: abs/1007.5110

Publication date: 2010